BIOCHEMISTRY


Multiple approaches of cellular metabolism define the 
bacterial ancestry of mitochondria 


Otto Geiger, Alejandro Sanchez-Flores, Jonathan Padilla-Gomez, Mauro Degli Esposti*


We breathe at the molecular level when mitochondria in our cells consume oxygen to extract energy from nu- 
trients. Mitochondria are characteristic cellular organelles that derive from aerobic bacteria and carry out oxi- 
dative phosphorylation and other key metabolic pathways in eukaryotic cells. The precise bacterial origin of 
mitochondria and, consequently, the ancestry of the aerobic metabolism of our cells remain controversial 
despite the vast genomic information that is now available. Here, we use multiple approaches to define the
most likely living relatives of the ancestral bacteria from which mitochondria originated. These bacteria live 
in marine environments and exhibit the highest frequency of aerobic traits and genes for the metabolism of 
fundamental lipids that are present in the membranes of eukaryotes: sphingolipids and cardiolipin.


INTRODUCTION 

Unveiling the origins of mitochondria continues to challenge 
science. While there is broad consensus that mitochondria first 
evolved 1600 million to 1800 million years ago, the critical question 
of from which bacteria they originated remains unanswered (1-13). 
Previous research has primarily relied on phylogenetic inference to 
identify the possible bacterial ancestors of mitochondria, hereafter 
called protomitochondria. However, this approach has produced 
inconsistent and varying results depending on the phylogenetic ap- 
proach, taxonomic sampling, and corrections used to reduce arti- 
facts (1-8, 12, 13). Many different alphaproteobacteria have been 
proposed to be close to protomitochondria [see (3, 5, 7, 8, 11, 13) 
and references therein]. The inconclusiveness of available evidence 
suggests that phylogenetic trees may not be adequate for identifying 
the extant bacteria that are closest to protomitochondria. This is 
likely due to the vast amount of time passed since the original sym- 
biotic event, which has diluted and dispersed the phylogenetic 
signal of contemporary bacterial proteins with respect to their mi- 
tochondrial homologs (2, 6). Differential loss of ancestral genes may 
additionally contribute to the complexity of defining the bacterial 
ancestors of mitochondria (6, 13). Moreover, the debate about 
whether the bacterial ancestor of mitochondria was an obligate or 
facultative aerobe [see (10) for a recent review] further complicates 
the evaluation of the metabolic ancestry of protomitochondria. 
Thus, new and robust evidence is needed to unveil the origins of 
mitochondria (6, 10, 12). 

To provide such evidence, we introduce here alternative ap- 
proaches with different sources of biases (table S1), covering 
aerobic and anaerobic metabolic traits shared by bacteria and
mitochondria (6-11). One such trait concerns the enzymes
involved in the metabolism of cardiolipin, a typical prokaryotic
phospholipid that is present only in the mitochondrial membranes
of eukaryotic cells (14). The guiding principle of our strategy is that
the creation of the first eukaryotic cell involved genomic transmis- 
sion of metabolic traits from a bacterium that could have surviving 
descendants today. Although the transmission has been a rare, if not
singular, event (1, 3, 6, 10), it might have left vestigial traces in the
genome of some of those descendants, similar to “missing link”
features found in other major transitions in evolution. An
example of evolutionary traces of this kind is the synteny of two 
genes of cytochrome c oxidase (COX or complex IV) (11, 15), the 
mitochondrial enzyme which ultimately consumes oxygen in our 
cells (Fig. 1 and fig. S1). Seven genes for COX subunits and acces- 
sory proteins form a conserved genomic cluster (operon) that is 
characteristic of alphaproteobacteria (11). Four of these genes are 
encoded in mitochondrial DNA (mtDNA) of early-branching uni- 
cellular eukaryotes (11, 15-17), while two others are present in mi- 
tochondrial complex IV of the protist Tetrahymena (Fig. 1) (18). 
Moreover, the gene for the assembly protein Cox11 
(Cox11_CtaG) always precedes the gene for cytochrome oxidase 
subunit III (COX3; Fig. 1), forming collinearity that is conserved 
in the mtDNA of some protists (fig. S1). The Cox11-COX3
synteny can thus be considered a genomic relic of the aerobic an- 
cestry of protomitochondria, providing a selection criterion for pu- 
tative bacterial relatives of mitochondria (11). Its absence in the 
genome of many bacteria, including the Rickettsiales often consid- 
ered relatives of mitochondria (4, 7, 15), would exclude such pro- 
karyotes from the ancestry of protomitochondria. Here, we 
present diverse new approaches that confirm this exclusion and in- 
dicate that the ancestor of protomitochondria was likely related to 
marine alphaproteobacteria never considered before for the evolu- 
tion of mitochondria. 


RESULTS AND DISCUSSION 

Unveiling a new synteny in complex III genes 

We searched for other genomic traces with equivalent discriminat- 
ing power as the Cox11-COX3 synteny, focusing on the possible 
genomic association of two proteins that are structurally and func- 
tionally interconnected in complex III: the mitochondrial process- 
ing protease (MPP) and the Rieske iron-sulfur protein (ISP; Fig. 1). 
Mitochondrial complex III (ubiquinol:cytochrome c reductase) 
derives from the bacterial cytochrome bc1 complex, which is
encoded by the petABC operon now represented by the cytochrome 
b gene in mtDNA [Fig. 1, cf. (15, 19)]. In plants and protists, two 
MPP proteins form a large domain of complex III structure, not 
Fig. 1. Evolution of the cytochrome oxidase gene cluster from proteobacteria to mitochondria. (A) The figure illustrates the gene clusters (operons) of cytochrome c
oxidase (complex IV) rendered with Roman mosaic tiles to indicate the mosaic nature of the whole mitochondrial proteome (3). The archetype rus COX operon (58)
includes an ancestral form of both heme A synthase (CtaA0) and transmembrane CtaG (CtaG_caa3). These genes are retained in the genome of nitrifying Nitrococcus
(second row in the illustration) but have been subsequently lost (58). The transition from the rus operon to the COX operon of Nitrococcus included the acquisition of two
additional transmembrane helices (TM) at its N terminus (18, 20, 22, 58). These extra TM may derive from those present at the N terminus of ancestral COX1 (58), as
indicated by the dashed green arrow on the top of the illustration. The gene for the M16.06 group of M16 zinc peptidases (23) is frequently associated with the end
of alphaproteobacterial COX operons, often with the gene for threonine synthase (Tsy). Potential homologs of bacterial DUF983 proteins and two SURF1 isoforms are part
of Tetrahymena complex IV (18). The last row of the illustration also includes a protein from the filarial Onchocerca containing the collinear fusion of M16B with iron-sulfur
protein (ISP) (accession OZC11663) encoded in the nuclear DNA of eukaryotes (N symbol). Similar fused proteins have been found in two other nematodes. (B) Genomic
positioning for the M16B genes in various bacteria. (C) Reassembled metagenomic contig829 for alpha J134 (67), a metagenomic-assembled genome (MAG) that clusters
with Iodidimonas in phylogenetic trees (fig. S6). The red arrow on the left indicates the %GC profile of the reassembled DNAs obtained with the program Proksee
(https://proksee.ca/). An equivalent gene cluster is present in the MAG, Rhodothalassiaceae bacterium KatS3mg119, which has been sequenced with long reads (75) and clusters
with Iodidimonadales (see Materials and Methods).


Geiger et al., Sci. Adv. 9, eadh0066 (2023) 9 August 2023 2 of 17 




SCIENCE ADVANCES | RESEARCH ARTICLE 


Table 1. List of the aerobic traits considered for the analysis in Fig. 2. AOX, alternative oxidase.

Protein and defining trait      | Premium for synteny*                | Penalty for absence   | Considered earlier
--------------------------------|-------------------------------------|-----------------------|-------------------
cyt C, cytochrome c             |                                     |                       | No, this work
COX1, COX complex IV            | +2 with COX2                        | -1                    | Yes
COX2, COX complex IV            |                                     | -1                    | Yes
COX3, COX complex IV            | +2 with Cox11                       |                       | Yes
Cox11, COX assembly             |                                     |                       | Yes
Cox15 type2, heme A synthesis   |                                     |                       | Yes
SURF1, COX assembly             |                                     |                       | Yes
SCO, COX assembly               | Numeral as its multiple genes†      |                       | Yes
Zf-CHCC, precursor subunit Vb   |                                     |                       | No, this work
M16B, precursor of MPP          | +2 with ISP                         | 0 if not close to COX | No, this work
Cbp3, bc1 assembly              |                                     |                       | No, this work
ISP, bc1 complex III            |                                     |                       | Yes
CytB, bc1 complex III           |                                     |                       | Yes
CytC1, bc1 complex III          |                                     |                       | Yes
CcmF, cyt C biosynthesis        | +1 with two other Ccm genes         |                       | Yes
CcmE, cyt C biosynthesis        |                                     |                       | No, this work
CcmA, cyt C biosynthesis        |                                     |                       | No, this work
AOX, alternative oxidase        | Numeral as its multiple genes†      |                       | Yes
M16A, fused zinc peptidases     | Capped numeral for multiple genes‡  |                       | No, this work
Cys near C terminus, COX1       |                                     |                       | No, this work

*Numerical premium given to collinear synteny, as described in the Materials and Methods.
†A few alphaproteobacterial genomes have multiple genes for SCO proteins, while some eukaryotes have multiple AOX genes.
‡Several eukaryotes have up to eight M16 (A, B, and C subfamily) peptidases, but their maximal numeral was capped at five as described in Materials and Methods.
present in bacterial bc1 complex (18, 20, 21). In animal mitochondria,
MPP derivatives called core proteins (CPs) have the same
structural organization (19, 22), while the MPP heterodimer is a 
separate soluble enzyme (20). We have confirmed two genes for 
MPP proteins but hardly any for CP proteins in the genomes of 
Rhodophyta, Discoba, and other single-cell eukaryotes (table S2). 
Hence, it is highly likely that MPP proteins were constitutive com- 
ponents of the first mitochondrial complex III, as seen in Tetrahy- 
mena (18). Notably, CPs retain the function of processing the 
presequence of ISP (19, 22), thereby underscoring the intimate
connection between MPP and ISP. Following the finding of a filarial
protein corresponding to the fusion of MPP with ISP (Fig. 1A), 
we systematically searched the genomes of currently available bac- 
teria for the contiguity of genes encoding the bacterial homologs of 
MPP and ISP. 
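
The contiguity search described above can be illustrated with a short sketch. This is not the pipeline used in this work: it assumes that each genome has already been reduced to an ordered list of gene labels per contig, and the function name `find_adjacent_pairs`, the `max_gap` parameter, and the toy gene orders are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_adjacent_pairs(
    contigs: Dict[str, List[str]],
    first: str,
    second: str,
    max_gap: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (contig, index_of_first, index_of_second) for every place where
    `second` follows `first` with at most `max_gap` genes in between.

    Gene order is taken as given; strand and orientation are ignored in this
    simplified sketch.
    """
    hits = []
    for name, genes in contigs.items():
        for i, gene in enumerate(genes):
            if gene != first:
                continue
            # Inspect only the genes immediately downstream of `first`.
            window_end = min(len(genes), i + 2 + max_gap)
            for j in range(i + 1, window_end):
                if genes[j] == second:
                    hits.append((name, i, j))
                    break
    return hits


# Hypothetical gene orders, loosely modeled on the arrangements in the text.
# petA is the gene of the petABC operon that encodes the Rieske ISP.
toy_genomes = {
    "contig_with_synteny": ["M16B", "petA", "petB", "petC"],
    "contig_without_synteny": ["SURF1", "Tsy", "M16B"],
}

print(find_adjacent_pairs(toy_genomes, "M16B", "petA"))
```

Setting `max_gap` above zero would relax the criterion from strict contiguity to genomic proximity, which corresponds to the "close to, or directly associated with" distinction drawn in the following paragraph.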

The bacterial homolog and likely precursor of MPP is a zinc
peptidase belonging to a specific group of the M16B subfamily 
(23). While the gene for M16B is isolated in gammaproteobacteria 
and early-branching alphaproteobacteria such as Rickettsiales, it is 
often associated with that of threonine synthase (Tsy) in other al- 
phaproteobacteria. These two genes appear to progressively 
migrate close to the COX operon in the genome of various alphap- 
roteobacteria, becoming attached to the gene of Surfeit locus 
protein 1 (SURF1) ending the same operon in a number of taxa 
(Fig. 1). This genomic contiguity is intermixed with the gene of
carboxypeptidase M32 (abbreviated as M32 in Fig. 1) in Rhodospirillales.
Only in Iodidimonadales, however, are the M16B genes found
close to, or directly associated with, the petABC operon of the bc1
complex (Fig. 1, A and C). This direct association is present in the
genomes of Iodidimonas spp. (Fig. 1), as well as in related
alphaproteobacteria Q-1 (table S3). Such taxa belong to the order
Iodidimonadales, which is part of the group of Sneathiellales,
Emcibacterales, Rhodothalassiales, Iodidimonadales, and Kordiimonadales
(SERIK) (2). Detailed searches of currently available genomes
failed to retrieve genomic associations equivalent to those found
in cultivated taxa of Iodidimonadales (Fig. 1, A and B), except for
metagenomic-assembled genomes (MAGs) that cluster with
Iodidimonadales, such as alphaproteobacteria bacterium J134 (alpha J134;
Fig. 1C).


Distribution of aerobic traits in mitochondria and 
alphaproteobacterial lineages 

The rare M16B-ISP synteny (Fig. 1) would represent a novel trace of 
the metabolic ancestry of protomitochondria or may derive from 
some unusual genomic streamlining. To discriminate between 
these possibilities, we followed different approaches, focusing first 
on the central part of the respiratory chain pivoting on cytochrome 
c (24). This part of the respiratory chain defines the aerobic metab- 
olism of mitochondria, which must have been crucial in the envi- 
ronmental adaptation of the first eukaryotes (3, 10); it is switched 
off in eukaryotes adapted to anoxia [see (10, 25) and references 
therein]. Three operons contribute to cytochrome c biogenesis 
and function in bacteria, and the majority of their genes, together 
with a few isolated genes for the assembly of the respiratory com- 
plexes such as Cox15 (heme A synthase), are present in eukaryotes 
(24). Overall, the number of the shared genes for the central part of 
the respiratory chain is 25, including soluble cytochrome c (see
Materials and Methods for details). We considered each of these genes
as an individual trait contributing to the aerobic metabolism of 
bacteria and mitochondria and computed their cumulative distribution
in different combinations (Table 1 and fig. S2A). We then
settled on a representative set of 20 traits including the bacterial
Zn-finger precursor of Cox5B (26) and a cysteine signature in a
conserved C-terminal region of catalytic COX subunit 1 (COX1) (27).
The latter trait was chosen for its common presence in
alphaproteobacteria with reduced genomes including Rickettsiales (table S3).
We also considered the genes for M16A peptidases, which have
multiple homologs in eukaryotes (Table 1) (28). These and other
traits have not been considered before in relation to eukaryogenesis
(1-8, 15), as indicated in Table 1.

Fig. 2. Distribution of aerobic trait scores along alphaproteobacterial lineages and mitochondria. The figure presents the lineage-dependent plot of the cumulative
scores for the set of 20 aerobic traits listed in Table 1. The mean values for each lineage follow the branching order of such alphaproteobacteria lineages along the x
axis (except for the mitochondria on the left). Asterisks indicate the distribution values that are not significantly different from that of aerobic mitochondria (P > 0.1 with
the 99% confidence t test; table S4). Note that the lineage of MarineProteo1 lacks the A1 type COX1-3 proteins that are shared by alphaproteobacteria and mitochondria.
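
The cumulative score can be pictured as a sum over traits with premiums for synteny and penalties for absence, as listed in Table 1. The sketch below is only an approximation under stated assumptions: the exact rules are those of the Materials and Methods, the function `cumulative_score` and the dictionaries are hypothetical names, only a subset of Table 1 is encoded, and each present single-copy trait is assumed to count as one point.

```python
# Hypothetical encoding of a few rows of Table 1 (premiums, penalties, caps).
SYNTENY_PREMIUM = {
    frozenset({"COX1", "COX2"}): 2,
    frozenset({"COX3", "Cox11"}): 2,
    frozenset({"M16B", "ISP"}): 2,
}
ABSENCE_PENALTY = {"COX1": -1, "COX2": -1}
# Traits scored by their gene copy number; None means no cap is applied.
MULTI_COPY_CAP = {"SCO": None, "AOX": None, "M16A": 5}


def cumulative_score(gene_counts, syntenic_pairs, traits):
    """Sum the per-trait scores of one genome.

    gene_counts: trait name -> number of gene copies found (0 if absent).
    syntenic_pairs: set of frozensets naming gene pairs in collinear synteny.
    traits: trait names included in the chosen combination.
    """
    score = 0
    for trait in traits:
        copies = gene_counts.get(trait, 0)
        if copies == 0:
            score += ABSENCE_PENALTY.get(trait, 0)
        elif trait in MULTI_COPY_CAP:
            cap = MULTI_COPY_CAP[trait]
            score += copies if cap is None else min(copies, cap)
        else:
            # Assumption: a present single-copy trait counts as one point.
            score += 1
    trait_set = set(traits)
    for pair, premium in SYNTENY_PREMIUM.items():
        if pair in syntenic_pairs and pair <= trait_set:
            score += premium
    return score


# Hypothetical genome, not taken from table S3.
example_traits = ["COX1", "COX2", "COX3", "Cox11", "M16B", "ISP", "M16A", "AOX"]
example_counts = {"COX1": 1, "COX2": 1, "COX3": 1, "Cox11": 1,
                  "M16B": 1, "ISP": 1, "M16A": 7, "AOX": 0}
example_synteny = {frozenset({"COX3", "Cox11"}), frozenset({"M16B", "ISP"})}

print(cumulative_score(example_counts, example_synteny, example_traits))
```

Passing different `traits` lists to the same function mirrors the idea of computing the distribution "in different combinations" of traits.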

Figure 2 presents the lineage-specific distribution (table S3) of
the chosen set of 20 aerobic traits in quantitative terms, showing 
an apparent peak around Iodidimonadales. This peak depends 
only partially on the premium given to the presence of the M16B- 
ISP synteny and is present in all other combinations of aerobic traits 
(fig. S2). The cumulative aerobic traits score for Iodidimonadales 
does not substantially differ from aerobic mitochondria (Fig. 2 
and table S4). Sneathiellales and related new clades of marine taxa 
(2), Kordiimonadales, Rhizobiales, Sphingomonadales, and Caulo- 
bacterales also showed a distribution of aerobic traits comparable 
with that of mitochondria (Fig. 2 and table S4). In contrast, the lineages
of MarineProteo1, MarineAlpha, Rickettsiales, Holosporales,
Pelagibacterales, and Rhodobacterales have significantly lower
cumulative scores than aerobic mitochondria (Fig. 2), suggesting
their exclusion from the aerobic ancestry of protomitochondria. 
However, it is difficult to discriminate which alphaproteobacterial
lineage may display the best match to the aerobic metabolism of
mitochondria. Hence, different approaches are needed to further filter
alphaproteobacteria lineages for identifying the most likely bacterial 
ancestor of protomitochondria (Tables 1 and 2, see also table S1). 
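
The comparison of each lineage with aerobic mitochondria relies on a t test (Fig. 2 and table S4). As a minimal sketch, assuming Welch's unequal-variance form of the test (the exact variant used is not stated in this section), the statistic and its approximate degrees of freedom can be computed as follows. The function name `welch_t` and the score lists are hypothetical.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical cumulative scores, not taken from table S4.
mito_scores = [24, 26, 25, 27, 23]
low_lineage_scores = [12, 14, 11, 15, 13]

t_stat, dof = welch_t(mito_scores, low_lineage_scores)
print(t_stat, dof)
```

A P value would then be read from the t distribution with `dof` degrees of freedom. A large P value means only that the lineage cannot be distinguished from mitochondria by this score, which is why further criteria are needed.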


Distribution of genes for ceramide and kynurenine 
biosynthesis in alphaproteobacteria 

Our next approach was based on a completely different metabolic 
pathway that has hardly been considered before in regard to eukaryogenesis:
the biosynthesis of ceramide-based lipids, the sphingolipids.
Sphingolipids constitute a vast class of membrane lipids that are 
ubiquitous in eukaryotes but scarcely present in bacteria (29, 30). 
To produce the precursor of ceramide, bacteria often require a 
four-gene operon ending with the gene for an α-oxoamine synthase
catalyzing the key step of ceramide biosynthesis: serine palmitoyl- 
transferase (SPT) [Fig. 3A, cf. (29-31)]. We found the spt gene en- 
coding SPT in Iodidimonadales and other members of the SERIK 
group, as well as in some Rhodospirillales and a few Rhizobiales 
(Fig. 3 and table S5). Our phylogenetic analysis indicates that
Odyssella sp. NEW MAG-112 may have the earliest enzyme for ceramide
biosynthesis of all alphaproteobacteria (Fig. 3B).
Table 2. Discriminatory criteria to evaluate which alphaproteobacteria may be close to protomitochondria. Taxa were selected from those that presented
with cumulative scores of aerobic traits above 20, thus overlapping the distribution values of aerobic mitochondria (Fig. 2), both types of cardiolipin synthase (Fig.
3E) and a collinearity bloc of ribosomal proteins (RP) encoded in the mtDNA of Jakobida (15). "Yes" indicates that a taxon has passed the discriminatory criterion
indicated in a given column. Taxa are listed at the genus level when other species of the same genus passed equivalent criteria. Rhodothalas. MAG indicates
Rhodothalassiaceae bacterium KatS3mg119 (75), an early-branching Iodidimonadales.

Alphaproteobacteria taxon | Both Cls types | RP synteny* | NO INDELS‡ | SPT & KynU | >3 anaerobic traits | Top hits MPPbeta | M16B-ISP synteny
Tistrella Yes Yes
Aestuariivirga litoralis

*Major collinearity bloc of RP proteins and RNA polymerase shared with Jakobida mtDNA (15).
†MAG largely incomplete, even after reassembling (see Materials and Methods).
‡No conserved INDELs in COX3 and ISP as in mitochondrial homologs (77).
§Taxa with cumulative aerobic traits below the second quartile of mitochondria.

Alphaproteobacterial SPT appears to be the ancestor of both
isoforms of eukaryotic SPT (Fig. 3B), as well as the SPT of nitrifying
taxa such as Nitrococcus (fig. S3A). Such nitrifying bacteria have
intracytoplasmic membranes resembling mitochondrial cristae, as in
alphaproteobacterial methanotrophs [Methylocystaceae (32, 33)]
that also have SPT (fig. S3A). Ceramide is well known to modulate
the curvature and shape of lipid bilayers (34); therefore, it may be
crucial for the formation of bacterial intracytoplasmic membranes.
The genomic distribution of spt and its partner genes shows a
maximum within the SERIK group, besides the expected high
frequency in Sphingomonadales (Fig. 3C). The limited presence of the
same genes in different lineages probably derives from events of
lateral gene transfer (LGT) since they are present only in a few
taxa that have SPT proteins clustering with those of other
alphaproteobacteria, as in the case of Caulobacter (fig. S3A). We also found
that SPT distribution often matches that of the kynureninase gene
kynU (table S5), encoding another pyridoxal 5'-phosphate-dependent
enzyme that defines the kynurenine pathway for NAD(P)+
[nicotinamide adenine dinucleotide (phosphate)] biosynthesis in bacteria
(35). An equivalent pathway is required for de novo synthesis of
rhodoquinone (RQ) in nematodes and other eukaryotes, which
use this quinone in their adaptation to low levels of oxygen (36)
(see below). The kynU gene is generally associated with kynA en- 
coding tryptophan dioxygenase, the upstream enzyme of the kynur- 
enine pathway (35, 36). This genomic association is present in 
several taxa of the SERIK group but absent in Caulobacterales 
(table S5). Hence, the SPT-kynureninase combination produces a 
stringent criterion for discriminating alphaproteobacterial lineages 
from the ancestry of protomitochondria, which must have had both 
traits now present in a variety of eukaryotes. 
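
The SPT-kynureninase criterion amounts to asking, for each lineage, which fraction of genomes carries both genes. The following sketch illustrates that tally under simple assumptions: genomes are represented as sets of gene names, and the function name `cooccurrence_frequency` and the toy records are hypothetical rather than taken from table S5.

```python
from collections import defaultdict


def cooccurrence_frequency(genomes, required=("spt", "kynU")):
    """Fraction of genomes per lineage that carry every gene in `required`.

    genomes: iterable of (lineage, set_of_gene_names) tuples.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for lineage, genes in genomes:
        totals[lineage] += 1
        if all(gene in genes for gene in required):
            hits[lineage] += 1
    return {lineage: hits[lineage] / totals[lineage] for lineage in totals}


# Hypothetical records for illustration only.
toy_records = [
    ("SERIK", {"spt", "kynU", "kynA"}),
    ("SERIK", {"spt"}),
    ("Caulobacterales", {"spt"}),
    ("Rickettsiales", set()),
]

print(cooccurrence_frequency(toy_records))
```

Adding `kynA` to `required` would turn the same tally into a test for the complete kynU-kynA association mentioned above.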


Distribution of the enzymes for the metabolism of 
cardiolipin 

Cardiolipin or diphosphatidylglycerol, as it is traditionally called in 
microbiology literature, is a dimeric phospholipid that provided one 
of the first hallmarks for the bacterial origin of mitochondria (14, 
37). In eukaryotic cells, it is synthesized in mitochondria and re- 
modeled with the participation of extramitochondrial enzymes 
but normally resides in the inner mitochondrial membrane (38, 
39). In prokaryotes, cardiolipin and its various derivatives are con- 
stituents of the cytoplasmic membrane, often fulfilling essential 
roles in viability (40). Despite the well-known bacterial origin of 
cardiolipin, the evolution of its metabolism is quite complex and
thus challenging because of the presence of multiple pathways for
cardiolipin biosynthesis in different organisms (14, 37, 40, 41). The
so-called “bacterial” pathway using an enzyme related to
phospholipase D (40), ClsA or Cls_pld, is also present in Kinetoplastida (42)
and other eukaryotes (14, 37). Conversely, the typical
“mitochondrial” cardiolipin synthase belongs to the superfamily of cytidine
5'-diphosphate-alcohol transferases [CDP-AT; cf. (43)]. This
enzyme is widespread among bacteria too (14) and is structurally
very similar to another member of the same superfamily, PgsA,
catalyzing the biosynthesis of phosphatidylglycerol-phosphate (PGP;
Fig. 3D and fig. S4A) (43, 44).

Fig. 3. Genomics and phylogenetic trees of bacterial and eukaryotic SPT, KynU, and cardiolipin enzymes. (A) Representation of the four-gene operon comprising
SPT found in proteobacteria, for example, Caulobacter (29, 31). (B) Phylogenetic maximum likelihood (ML) tree of SPT from alphaproteobacteria and various eukaryotes,
which have two isoforms (30): the catalytic LCB2 and the inactive LCB1. Paralog proteins of 5-aminolevulinate synthase are used as outgroups providing the root of the
tree. The alpha MAG originally named Odyssella sp. NEW MAG-112 (GCA_016792765.1) is not a member of the Holosporales but of the order o_Bin65 according to GTDB
taxonomy (65). It is included here in the lineage of Emcibacterales and Zavarziniales for similar INDELs profiles (tables S3 and S4) and therefore labeled as “Odyssella.”
Numbers indicate the strength of the nodes in percentage values of ultrafast bootstraps. (C) Frequency distribution of the combination of the spt and kynU genes,
irrespective of their surrounding genes, along the alphaproteobacterial lineages. Note that the majority of the eukaryotic taxa used for the analysis in Fig. 2 do have
SPT proteins, while the distribution of the kynU gene is scattered (36). (D) Phylogenetic ML tree of various CDP-alcohol transferase (CDP-AT) proteins. (E) Frequency
distribution of the enzymes for cardiolipin metabolism, cumulatively indicated as CL-related enzymes. Blue histograms represent the distribution data of the presence
of both types of cardiolipin synthase (Cls) in the same genome, while the brownish histograms represent the distribution data of the same Cls proteins plus that of PgsA
and relatives of Cld1, cardiolipin-specific lipase of yeast (table S5) (39).

The taxonomic distribution of the CDP-AT type of cardiolipin
synthase, Cls-AT, is not homogeneous among Opisthokonts, the
eukaryotic group spanning Amoebozoa and metazoa (45). While
metazoans and fungi have it, other members of the super-group
such as Amoebozoa have the Cls_pld type instead (14, 37, 41).
Conversely, we found that the genome of Thecamonas, a unicellular
biflagellate basal to the super-group of Opisthokonts (45), contains
both types of cardiolipin synthase: KNC46788 for Cls-AT and
KNC55721 for Cls_pld (fig. S4, B and D). A similar dual presence
of the different cardiolipin synthases has been previously reported 
for two species of Stramenopiles belonging to the phylum Bigyra of 
the Stramenopiles, Alveolates and Rhizaria (SAR) super-group (41). 
We found two other species of Bigyra that have both types of car- 
diolipin synthase, i.e., Hordea fermentalgiana and Bicosoecida sp. 
CB-2014. The sequence of the Cls_pld type of the latter organism 
does not have a conserved Lys in the first HisLysAsp (HKD) motif of 
catalytic residues in phospholipase D enzymes (fig. S4). Mutagene- 
sis of this invariant Lys produces a total loss of activity (46); conse- 
quently, the Cls_pld enzyme of Bicosoecida sp. CB-2014 is likely 
inactive. The genome of Bicosoecida sp. CB-2014 may thus represent
an evolutionary transition from the ancestral presence of both
types of cardiolipin synthase to the current dominance
of the Cls-AT type among Stramenopiles and many other
eukaryotes (41).
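
The check on the catalytic Lys of the HKD motif can be illustrated with a simple pattern search. The sketch assumes the commonly used HxKxxxxD form of the phospholipase D motif; the function name `hkd_report` and the two peptide strings are hypothetical and do not correspond to the sequences in fig. S4. A real analysis would rely on an alignment with characterized enzymes, because a relaxed pattern of this kind can also match unrelated sites.

```python
import re

# Relaxed form of the HxKxxxxD motif: the residue at the Lys position is
# captured so that substitutions can be reported. The lookahead allows
# overlapping candidate sites to be found.
RELAXED_HKD = re.compile(r"(?=(H.(.).{4}D))")


def hkd_report(seq):
    """List (start, motif, lys_conserved) for every HxXxxxxD-like site in seq."""
    return [
        (match.start(), match.group(1), match.group(2) == "K")
        for match in RELAXED_HKD.finditer(seq)
    ]


# Hypothetical peptides: one with the canonical Lys, one with a substitution.
canonical = "MAAHSKLIVVDGG"
substituted = "MAAHSRLIVVDGG"

print(hkd_report(canonical))
print(hkd_report(substituted))
```

A site reported with `lys_conserved` set to `False` would flag an enzyme as a candidate for loss of activity, in line with the mutagenesis result cited above (46).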
The simplest explanation for the dual presence of different
enzymes for cardiolipin biosynthesis in some eukaryotes, and the 
scattered distribution of either type in different eukaryotic groups 
(14, 41), would be that the bacterial ancestor of protomitochondria 
had both types of cardiolipin synthase too. Differential loss from the 
ancestral state having both types of enzymes would then rationalize 
the complex distribution among eukaryotes. In support of this pos- 
sibility, we found the dual presence of the different cardiolipin syn- 
thases in the genome of some alphaproteobacteria, with maximal 
frequency in the lineages of Iodidimonadales and Rhizobiales 
(Fig. 3E and table S6). While MAGs clustering with Iodidimona- 
dales such as alpha J067 have a complete Cls-AT and a standard 
Cls_pld (table S6), Iodidimonas spp. have a different subtype of
CDP-AT proteins with long N-terminal extensions that may func- 
tion as the Cls-AT of other taxa, despite its overall similarity with 


the distant CDP-AT AF2299 (43). The Iodidimonadales proteins 
with N-terminal extension may represent an ancestral relative of 
the Cls-AT type of cardiolipin synthase since they form an early- 
branching clade in the phylogeny of both the bacterial and mito- 
chondrial enzymes, as well as PgsA (Fig. 3D). They display local se- 
quence similarities with conserved signatures of Cls-AT that are 
much higher than with other members of the CDP-AT superfamily 
(see Materials and Methods for details). Proteins closely related to 
those of Iodidimonadales are present in a few Rhodospirillales, for 
example, Niveispirillum, and various species of Caulobacter, though 
cardiolipin lipids seem to be absent in these taxa, including the most 
studied Caulobacter vibrioides (formerly Caulobacter crescentus)
(47). Consistent with biochemical data, our genomic analysis indi- 
cates that enzymes of cardiolipin metabolism besides PgsA are es- 
sentially absent in the Caulobacterales lineage (Fig. 3E and table S6). 


Fig. 4. Distribution of proteins for anaerobic traits in a selection of alphaproteobacteria. At least two representatives for each of the lineages of alphaproteobacteria
considered here (Fig. 2) are presented in a compacted phylogenetic sequence from top to bottom. Several taxa are considered for Rhodospirillales and the SERIK group, as
well as the lineage of new clades (2), because they present the highest concentration of OFOR traits (fig. S5), as well as of those for the anaerobic production of Q (2, 51,
52). "0.5" indicates partial proteins. Traits boxed in black squares indicate the potential for de novo biosynthesis of RQ (see text). The abbreviations for the various traits are
described in the legend boxes on the right. The traits of CoQ9 (q9) on the far left and of the chaperone of complex II (se) on the far right are reference common genes
shared with mitochondria.

Heatmap rows, lineages from top to bottom: Marine Alpha; Rickettsiales; Rhodospirillales super-order; NEW clade - UBA2966; Oxycline; SERIK - Sneathiellales; SERIK - Emcibacterales;
SERIK - Rhodothalassiales; SERIK - Iodidimonadales; SERIK - Kordiimonadales; Rhizobiales; Caulobacterales; Sphingomonadales; Rhodobacterales.

Heatmap rows, taxa as NCBI entries from top to bottom: Alphaproteobacteria HIMB59; Alphaproteobacteria TMED109; Rickettsia conorii str. Malish - clade a; Neorickettsia risticii strain Illinois;
Geminicoccus roseus; Tistrella mobilis; Micavibrio aeruginosavorus; Aliidongia dinghuensis; Thalassobaculum multispecies; Terasakiella magnetica; Elstera cyanobacteriorum;
Azospirillum rugosum; Arenibaculum pallidiluteum; Nitrospirillum amazonense CBAmc; Azospirillum lipoferum 4B; Azospirillum picis; Alphaproteobacteria SI037_bin135 o_UBA2966;
Alphaproteobacteria S1054_bin61 o_UBA2966; Rhodospirillaceae ARS1032 o_UBA2966; Rhodospirillaceae co234_bin8 cluster 1; Rhodospirillaceae NP113; Rhodospirillaceae S1039_bin34;
Rhodospirillaceae S1074_bin97; Rhodospirillaceae S1074_bin47 cluster NosZ; Oceanibacterium hippocampi; Zavarzinia aquatilis; Alphaproteobacteria M_DeepCast_65m_m2_129;
Kordiimonadaceae S1072_bin125 Emcibacterales; Emcibacter nanhaiensis; Paremcibacter congregatus; Alphaproteobacteria M_MaxBin.053 Emcibacteraceae; 'Odyssella' sp. NEW MAG-112;
Rhodothalassium salexigens DSM 2132; Alphaproteobacteria J084; Iodidimonas muriae; Kordiimonas lipolytica; Kordiimonas pumila; Kordiimonas gwangyangensis; Kordiimonadaceae NORP235;
Aestuariivirga litoralis; Rhizobium sullae; Brevundimonas mediterranea; Caulobacter vibrioides CB15; Sphingomonas/Rhizorhabdus wittichii; Sphingopyxis flava;
Paracoccus denitrificans PD1222; Rhodobacter sphaeroides 2.4.1.

Heatmap legend. Q/RQ traits: q9, CoQ9; h, UbiA; t, UbiT; u, UbiU; v, UbiV; k, kynU. OFORs (not photosynthetic): i, OR (both types); o, OGOR & other OFORs; p, PFOR.
Reference trait: se, complex II chaperone.
Considering the above evidence, it is likely that the Cls_pld type
of cardiolipin synthase in Iodidimonas (table S6) predominantly
produces the small levels of cardiolipin reported in this taxon
(48). Note that the cardiolipin content in the inner mitochondrial membrane
of several eukaryotes can amount to up to 20% of total lipids (49), a
value much higher than that usually encountered in
alphaproteobacteria. Energetically, the reactions catalyzed by the Cls_pld type
and the Cls-AT type are quite different. Whereas no standard free energy
is liberated in the Cls_pld-catalyzed reaction, the standard free
energy for the Cls-AT-catalyzed reaction is quite negative due to
the hydrolysis of CDP-diacylglycerol to cytidine monophosphate
(14, 37). The liberation of such free energy shifts the equilibrium of
the Cls-AT reaction toward the cardiolipin product, thereby
achieving high concentrations of cardiolipin in the near absence of the
phosphatidylglycerol precursor, which is much more concentrated
in bacterial than in mitochondrial membranes (37, 40, 44). For this
reason, the Cls-AT type might have been selected to guarantee
cardiolipin levels under conditions of low availability of
phosphatidylglycerol, as in the marine environments where Iodidimonas spp. live
(48). Notably, Bigyra taxa that also have both types of cardiolipin
synthase live in similar marine environments (41), as the earliest eu- 
karyotes presumably did (10). These considerations would rational- 
ize why the presence of both types of cardiolipin synthase appears to 
be the ancestral state for cardiolipin metabolism in eukaryotes (41). 
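
The energetic argument above can be written out with the standard relation between free energy and the equilibrium constant. The sketch below assumes the commonly described stoichiometries of the two reactions (condensation of two phosphatidylglycerol molecules with release of glycerol for Cls_pld; transfer of the phosphatidyl group from CDP-diacylglycerol to phosphatidylglycerol with release of CMP for Cls_AT). The symbols PG, CL, and CDP-DAG are shorthand introduced here, and no numerical free energy values are implied.

```latex
% Cls_pld:  2 PG            <=>  CL + glycerol   (standard free energy change close to zero)
% Cls_AT:   PG + CDP-DAG    <=>  CL + CMP        (standard free energy change negative)
\begin{align}
K_{\mathrm{eq}}  &= \exp\!\left(-\frac{\Delta G^{\circ\prime}}{RT}\right) \\
K_{\mathrm{pld}} &= \frac{[\mathrm{CL}]\,[\mathrm{glycerol}]}{[\mathrm{PG}]^{2}} \approx 1
  \quad \text{when } \Delta G^{\circ\prime} \approx 0 \\
K_{\mathrm{AT}}  &= \frac{[\mathrm{CL}]\,[\mathrm{CMP}]}{[\mathrm{PG}]\,[\mathrm{CDP\text{-}DAG}]} \gg 1
  \quad \text{when } \Delta G^{\circ\prime} \ll 0
\end{align}
```

Under these assumptions, the cardiolipin level sustained by Cls_pld scales with the square of the phosphatidylglycerol concentration and therefore collapses when the precursor is scarce, whereas the large equilibrium constant of Cls_AT can maintain cardiolipin even at low phosphatidylglycerol.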


Analysis of anaerobic traits along alphaproteobacteria 
lineages 

The previous analysis of kynU distribution has suggested that de 
novo synthesis of RQ may occur also in alphaproteobacteria. This 
would be an important novelty relevant to the bacterial ancestry of 
protomitochondria because RQ is one rare trait of anaerobic metab- 
olism that is shared by mitochondria and alphaproteobacteria (25, 
36, 50). Members of the Azospirillaceae family that have kynU and 
related genes of the kynurenine pathway (table S5) also have two or 
more genes encoding different forms of the prenyl-transferase 
(UbiA) protein catalyzing a critical step in ubiquinone (Q) 
biosynthesis (Fig. 4). This is an infrequent occurrence that echoes a 
distinctive feature of de novo RQ biosynthesis in Caenorhabditis 
elegans: the presence of a second UbiA protein that specifically 
transfers the isoprenoid tail to the ring precursor derived from 
the kynurenine pathway (36). Therefore, the presence of multiple 
genes for UbiA proteins in alphaproteobacteria that likely have 
the kynurenine pathway (boxed in Fig. 4) suggests that such bacteria 
may synthesize RQ by a de novo system equivalent to that of 
C. elegans. We additionally found that the genomes of several 
alphaproteobacteria also have the ubiTUV genes for the anaerobic 
biosynthesis of Q (Fig. 4) (51, 52). These are Nitrospirillum 
amazonense, Arenibaculum, most members of the Azospirillum lip- 
oferum clade, Rhodothalassium spp., and four MAGs including 
Odyssella sp. NEW MAG-112 encountered before (Fig. 3B). 
Notably, the proteins encoded by ubiU and ubiV bind a 4Fe4S 
cluster promoting the oxygen-independent hydroxylation of the 
Q ring under anaerobic conditions (52). Under normal oxygen con- 
ditions, this hydroxylation is catalyzed by flavin hydroxylases such 
as UbiH (51). We have found paralogs of UbiU in Chlorophyta and 
other protists, often in fused proteins containing a similar, UbiV- 
like domain (see Materials and Methods for further details). It is 
thus possible that the ubiU-ubiV synteny has been transmitted by 
an alphaproteobacterial progenitor to primordial eukaryotes, 
thereby rendering these genes additional markers for the anaerobic 
metabolism shared by facultatively aerobic bacteria and eukaryotes 
(2, 25, 51, 52). 

Next, we considered the distribution of 2-oxoacid:ferredoxin ox- 
idoreductases (OFORs) as established traits for anaerobic metabo- 
lism (7, 25). Five different types of OFORs (53) are found in the 
genomes of alphaproteobacteria, the long indolepyruvate:ferredox- 
in oxidoreductase being the most common (Fig. 4 and fig. S5A). 
The cumulative distribution of OFOR genes is uneven along the 
various lineages, with maximal concentration in the new clades 
(Fig. 4 and fig. S5). Eukaryotes adapted to anoxic conditions pre- 
dominantly have the pyruvate:ferredoxin oxidoreductase and the 
oxoglutarate:ferredoxin oxidoreductase in hydrogenosomes or 
other mitochondria-related organelles (MRO) (25). The genes for 
these proteins are concentrated in the phylogenetic space spanning 
the new clades and the SERIK group (2), which also shows high 
scores for aerobic traits (Fig. 4 and fig. S5). Hence, the distribution 
of the metabolic traits considered so far is not random among extant 
alphaproteobacteria, converging on a central phylogenetic region 
that may have the highest probability to be close to the ancestors 
of protomitochondria. To investigate this possibility, we used 
various discriminatory criteria derived from additional approaches 
(Table 2), which were selected from a large set of traits (Table 3) on 
the basis of discriminatory power (mainly derived from a relatively 
narrow distribution among alphaproteobacteria compared with a 
common presence in eukaryotes), monophyletic clustering of eu- 
karyotic proteins and independence from other approaches. 


Selection of candidate bacteria and relationships with 
phylogenetic evidence 

Table 2 lists alphaproteobacteria selected after an initial screening 
based on the presence of both types of cardiolipin synthase (Fig. 
3E and table S6), which were further evaluated with discriminatory 
criteria derived from different approaches, for example, the 
collinearity in a block of ribosomal genes as in the mtDNA of Jakobida 
(15). Among the 20 bacteria that passed at least two of such criteria, 
Iodidimonas and related taxa passed the highest number, suggesting 
that these taxa may have a higher probability to match the metabolic 
profile of protomitochondria than other alphaproteobacteria (Table 
2). Detailed genomic analysis of isolated genes and various operons 
indicated that this finding was not correlated with peculiar features 
of gene insertion, transfer, or duplication in Iodidimonadales versus 
other alphaproteobacteria. Therefore, the results in Table 2 support 
the novel possibility that the origin of protomitochondria was close 
to the phylogenetic space encompassing extant Iodidimonadales. This 
contrasts with the recent proposal that protomitochondria might have 
originated from a lineage outside core 
alphaproteobacteria (1, 8, 54, 55). We believe that this discrepancy 
fundamentally derives from the complexities of sophisticated phy- 
logenetic analysis (12). 
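
The two-step selection described above (an initial screen for both types of cardiolipin synthase, followed by counting how many discriminatory criteria each taxon passes) can be sketched as follows. This is a minimal illustration and not the procedure used to build Table 2: the taxon names, the criterion labels other than the ribosomal collinearity, and the function `rank_candidates` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_candidates(
    criteria_passed: Dict[str, Set[str]],
    has_both_cls: Dict[str, bool],
    min_criteria: int = 2,
) -> List[Tuple[str, int]]:
    """Return (taxon, number of criteria passed), best candidates first."""
    ranked = []
    for taxon, passed in criteria_passed.items():
        # Initial screen: both types of cardiolipin synthase must be present.
        if not has_both_cls.get(taxon, False):
            continue
        # Keep only taxa passing at least `min_criteria` criteria.
        if len(passed) >= min_criteria:
            ranked.append((taxon, len(passed)))
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical toy input; these are not the entries of Table 2.
criteria_passed = {
    "taxon_A": {"ribosomal_collinearity", "criterion_x", "criterion_y"},
    "taxon_B": {"ribosomal_collinearity", "criterion_y"},
    "taxon_C": {"ribosomal_collinearity"},
    "taxon_D": {"ribosomal_collinearity", "criterion_x"},
}
has_both_cls = {"taxon_A": True, "taxon_B": True, "taxon_C": True, "taxon_D": False}

ranking = rank_candidates(criteria_passed, has_both_cls)
print(ranking)
```

In this toy input, taxon_C is dropped because it passes a single criterion and taxon_D because it fails the cardiolipin synthase screen, so only taxon_A and taxon_B remain in the ranking.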

The proposal that protomitochondria may be a sister group of 
core alphaproteobacteria is based on the strict interpretation of 
maximum likelihood (ML) phylogenies (1, 2, 8, 54), with the as- 
sumption that Magnetococci used for rooting the phylogenetic 
trees are not part of the alphaproteobacteria class (1, 8, 54). This 
phylogenetic placement of Magnetococci remains controversial 
(5, 8, 12, 56) and therefore affects the branching order of ML 
trees, which are known to produce different basal bifurcations 
when additional deep-branching sequences are included (57). In 
Table 3. Other metabolic systems and traits analyzed in this work and their characteristics. NADH, reduced form of nicotinamide adenine dinucleotide; TCA, 
tricarboxylic acid cycle; ATP, adenosine 5'-triphosphate; FOF1, FOF1 ATP synthase; MICOS, mitochondrial contact sites and cristae organizing system; SQMO, 
squalene monooxygenase; MRO, mitochondria-related organelle. 

Traits | Metabolic system | Reference | Taxonomic distribution and phylogenetic patterns for alphaproteobacteria (alpha) vs. eukaryotes | Iodidimonas 
>16 | NADH-quinone oxidoreductase, complex I; TCA cycle also under anaerobiosis | (2, 8) | Already analyzed in previous works. The largest membrane subunits have strong phylogenetic signal but suffer from compositional artifacts due to their hydrophobicity. Other subunits have variable signals, with some too short for providing valuable phylogenies. Constitutively present in basically all alpha* | Yes 
5 | Succinate dehydrogenase, complex II; TCA cycle also under anaerobiosis | This work and (8) | Already analyzed in previous works. The two largest catalytic subunits have good phylogenetic signal, while the two membrane subunits are short and poorly conserved. Constitutively present in basically all alpha* | Yes 
212 | FOF1 ATP synthase | (8, 80) | Already analyzed in previous works. The largest catalytic subunits have strong phylogenetic signal, while most membrane subunits are short and poorly conserved. Constitutively present in all alpha* | Yes 
1 | ETF-Q dehydrogenase | This work | Monophyletic eukaryotic clade. Constitutively present in basically all alpha*. | Yes 
1 | Sulfite oxidase | This work | Monophyletic eukaryotic clade but localized in peroxisomes in Viridiplantae. Widely distributed among alphaproteobacteria. | Yes 
4 | [Fe-Fe]hydrogenase and its assembly factors - anaerobic metabolism | (25, 82) | The mature hydrogenase localizes in MRO and also the cytoplasm of various anaerobic eukaryotes, which might have acquired the traits via LGT from different bacteria. Very limited distribution in alpha. | No 
2 | NirBD for nitrate assimilation | (11) | The pathway is not localized in mitochondria. Predominantly present in Opisthokonts forming polyphyletic eukaryotic clades with intermixed bacteria. | No 
1 | Mic60 | This work and (8, 33) | Already analyzed in previous works. It may have a different function in bacteria (part of the hem operon) and mitochondria (part of the MICOS system regulating cristae). Poor conservation of bacterial proteins. | Yes 
2 | FtsY and Fhp for membrane sorting | This work and (83) | Only Fhp produces a monophyletic eukaryotic clade. Distribution is limited to a small set of eukaryotes. | Yes 
28 | Aerobic biosynthesis of ubiquinone | This work and (57) | Proteins are poorly conserved and some have yet unrecognized eukaryotic counterparts. Both polyphyletic and monophyletic eukaryotic clades. Constitutively present in all alpha* without parasitic lifestyle. | Yes 
>4 | Biosynthesis of sterols | This work and (70) | The pathway is not localized in mitochondria. SQMO catalyzes the first oxygen-dependent step but produces polyphyletic eukaryotic clades with intermixed bacteria. Moreover, this and other enzymes for sterol biosynthesis are scarcely present in alpha. | Partially 
>4 | Biosynthesis of phospholipids: PC, PI, and PS | This work and (40) | The pathways are not localized in mitochondria and include members of the CDP-AT superfamily related to enzymes evaluated here for cardiolipin metabolism. Poor conservation between alpha and eukaryotes. Scattered distribution among alpha. | Partially 
1 | CTP synthase | (80) | Nonmonophyletic clade when most eukaryotes are considered. Constitutively present in most alpha*. | Yes 
4 | Nuclear encoded respiratory proteins, similarity by BLAST | This work and (58) | The eukaryotic proteins generally produce monophyletic clades. In this work, BLAST searches were computed for some proteins as described in Materials and Methods. The results of MPPbeta are presented in fig. S2C. | Yes 

*In principle, traits that are constitutively present in most alphaproteobacteria cannot provide discriminatory criteria for selecting relatives of protomitochondria, 
contrary to the traits in Tables 1 and 2. 




our experience, the addition of proteins from members of the Ma- 
rineProteol lineage to taxonomically broad alignments of complex 
I subunits produced ML trees in which these proteins formed a 
sister clade to that of core alphaproteobacteria (2). When mitochon- 
drial proteins were also added, they formed a clade branching 
between the basal clade of MarineProteol and that of core proteo- 
bacteria (2), thus reproducing previous results (1, 8). Without the 
proteins from MarineProteol, the same alignments often generated 
ML trees in which the mitochondrial clade was sister to that of 


Geiger et al., Sci. Adv. 9, eadh0066 (2023) 9 August 2023 


Rickettsiales (2, 7), most likely due to compositional bias and 
other artifacts (8). This situation is common to many proteins de- 
fining the traits examined here, while several of such proteins are 
generally not present in the genomes of MarineProteol taxa (Figs. 
2 and 3 and tables 3 to 5). Published phylogenies with MarinePro- 
teol were constructed without proper representation of proteins de- 
fining aerobic traits (1, 54), reflecting instead the dominant signal of 
complex I proteins. This distortion was not adequately balanced by 
adopting a much larger set of protein markers (8) (see Materials and 
Methods for details). Hence, the critical issue of an appropriate 
choice of protein markers and taxonomic sampling has remained 
unsettled. 

We addressed this issue further by using phylogenetic trees re- 
constructed with the COX3 protein, which has a relatively good 
phylogenetic signal (2, 11) and does not show mitochondria 
segregating in a sister clade to core alphaproteobacteria (fig. S6). 
Moreover, the molecular history of COX3 involved the N-terminal fusion 
of a domain containing two TM that might derive from ancestral 
COX1 proteins [Fig. 1A cf. (58)]. This fusion left a large insert 
between the second and third TM of the 7TM COX3 of various bac- 
teria (59), which is common among the proteins of gammaproteo- 
bacteria such as Nitrococcus. However, this insert was lost in the 
COX3 proteins of basal alphaproteobacteria such as Caenispirillum 
and other members of the Rhodospirillaceae family (56), which 
have a simple variant of the COX operon without CtaB and 
SURF1 [Fig. 5 cf. (11)]. Subsequently, COX proteins acquired one 
to three different inserts, or more appropriately INsertions and 
DELetions (INDELs) (59), along the evolution of alphaproteobacteria, 
around similar positions but with different sequence length 
and signatures with respect to those of gammaproteobacteria (fig. 
S6). These differences indicate independent acquisition of 
INDELs. Mitochondrial COX3 does not have any of these 
INDELs, and therefore, it is close to the ancestral COX3 of basal 
alphaproteobacteria. Nevertheless, the crown of the mitochondrial 
clade of COX3 has a stem length comparable to that of diverse 
alphaproteobacterial lineages. For example, the relative distance of 
the crown encompassing Rhodothalassiales, Iodidimonadales, and 
Kordiimonadales (RIK) is 1.08 ± 0.08 with respect to crown 
mitochondria (n = 15), hence not significantly different (Fig. 5 and 
fig. S6). 

The branching pattern just described changed completely when 
phylogenetic trees were reconstructed with the COX3 proteins of 
the alphaproteobacteria taxa in Table 2 (Fig. 5), which have been 
selected by criteria completely different from those used in recent 
papers (1, 6-8). Now, the mitochondrial clade is the latest branching 
and often clusters with proteins of members of the SERIK and new 
clades. Caenispirillum COX3 always forms the basal branch, gener- 
ally followed by that of Tistrella, which has a different variant of 
COX operon (Figs. 1 and 5). The tree in Fig. 5, therefore, offers 
an alternative interpretation for the possible evolution of COX3, 
an essential component of the aerobic metabolism shared by bacte- 
ria and mitochondria. The mitochondrial proteins likely evolved 
after separation from the ancestor of the alphaproteobacterial line- 
ages of SERIK, Rhizobiales and Rhodospirillales, but maintained 
the absence of INDELs characteristic of ancestral COX3 such as 
that of Caenispirillum. Among the above lineages, only Iodidimo- 
nadales retain a COX3 without INDELs, even if the branching 
pattern is not resolved enough to indicate a direct connection 
with basal branches. Nevertheless, in trees such as shown in Fig. 
5, the crown distance of mitochondrial COX3 is significantly 
longer than that of Iodidimonadales (relative mean distance 0.82 
± 0.05, n = 9, P < 0.01; inset in Fig. 5). This evidence supports the 
possibility that the ancestor of extant Iodidimonadales was at least 
contemporary with that of protomitochondria. Although devoid of 
catalytic centers, COX3 plays a fundamental role in regulating the 
oxygen affinity of mitochondrial complex IV (11, 60). Therefore, 
the molecular history of this protein subunit bears relevance to 
the evolution of aerobic metabolism in eukaryotes. 
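
The relative distances quoted above are ratios of stem lengths measured in the same tree, summarized across several trees. A minimal sketch of that normalization is given below, assuming that each tree yields one stem length for the clade of interest and one for crown mitochondria. The branch lengths are hypothetical, and the statistical test behind the reported P value is not reproduced here.

```python
import statistics
from typing import List, Tuple


def relative_distances(stems: List[Tuple[float, float]]) -> List[float]:
    """Each tuple holds (stem length of the clade, stem length of crown
    mitochondria) from one tree; the ratio is the relative distance."""
    return [clade / mito for clade, mito in stems]


def summarize(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across trees."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical stem lengths (substitutions per site) from three ML trees.
stems = [(0.40, 0.50), (0.45, 0.50), (0.36, 0.48)]
ratios = relative_distances(stems)
mean, sd = summarize(ratios)
print(f"relative distance = {mean:.2f} +/- {sd:.2f} (n = {len(ratios)})")
```

A ratio below 1 means that the stem of the clade is shorter than that of crown mitochondria in the same tree, which is the sense in which the distances in the inset of Fig. 5 are compared.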
Together, our results indicate that very few alphaproteobacteria 
would be selected by combining various discriminatory criteria 
(Table 2) for identifying the possible ancestors of 
protomitochondria. Iodidimonas spp. and relatives have a higher 
probability than other possible candidates because they pass more 
discriminatory criteria than other bacteria (Table 2). Considering our 
results combined with phylogenetic data (Fig. 5 and fig. S6), we 
hypothesize 
that deep-branching MAGs of the Iodidimonadales such as alpha 
J134 [Fig. 1C, cf. (61)] may define the phylogenetic space from 
which protomitochondria originated. Alpha J134 and other 
MAGs clustering with Iodidimonadales live in ecological niches 
comparable to those present in Proterozoic oceans (10, 61). In 
turn, they are phylogenetically close to various MAGs thriving in 
marine zones with oxygen gradient (2, 61)—the kind of environ- 
ment that may be nearest to that pervading Proterozoic oceans 
when protomitochondria evolved (10, 25). Therefore, our data 
and insights dovetail with the emerging picture that a facultative 
aerobe was the likely bacterial ancestor of protomitochondria (10). 


MATERIALS AND METHODS 

The aim of the project leading to this paper was to unveil the met- 
abolic ancestry of protomitochondria (https://osf.io/t9qze/, ac- 
cessed on 7 April 2023). To achieve this, we have followed diverse 
approaches that have different sources of bias (table S1) (62) and 
provide alternative data for the study of the bacterial origin of mi- 
tochondria mainly because they focus on shared metabolic traits. 
The fundamental approach used previously was phylogenetic infer- 
ence (1-8) based on increasingly wider sets of proteins shared by 
alphaproteobacteria and mitochondria (1, 8). The contribution of 
complex I protein subunits was predominant in the set of "24 
alphamitos COG" introduced by Ettema and coworkers (1, 54) since 36% 
of all proteins and more than 50% of the amino acids in the concat- 
enated alignments belonged to complex I subunits. We have not 
considered the protein subunits of complex I (reduced form of nic- 
otinamide adenine dinucleotide-ubiquinone oxidoreductase) nor 
those of complex II (succinate dehydrogenase) because these 
enzyme complexes are present in basically all alphaproteobacteria 
(Table 3), including those that do not have a central metabolism 
equivalent to that of aerobic eukaryotes (11, 53). We have excluded 
other traits that are shared by eukaryotes and alphaproteobacteria 
for similar and other reasons, listed in Table 3. The primary focus 
of our work was on aerobic metabolism, which has been undereval- 
uated recently because the protein traits defining this metabolism 
have been systematically underrepresented: 11.1% of all proteins 
in the expanded list of (8) and 16.7% in the 24 alphamitos COG 
set (1). 


Choice of aerobic traits and their quantitative analysis 

The cytochrome part of the respiratory chain forms the core of the 
aerobic metabolism of mitochondria (Fig. 1) and in bacteria is con- 
tributed by 25 proteins that are mostly shared with eukaryotes: 10 in 
the COX operon (Fig. 1A) and separate genes for Cox15, Synthesis 
of Cytochrome c Oxidase (SCO), and Cox5B (Table 1); 9 for the 
Ccm system (system I) of cytochrome c maturation (Ccm) (63); 6 
for complex III, including the Cbp3 chaperone (24) and the two 
MPP-related CPs (20); and 1 for the soluble cytochrome c that func- 
tions as an electron acceptor for complex III and electron donor for 
complex IV. Several homologs of this soluble cytochrome are often 
Fig. 5. Phylogenetic tree of COX3 proteins from the alphaproteobacteria of Table 2. The figure shows a representative of several ML trees obtained with an 
alignment of 46 COX3 proteins and different models of programs (IQ-Tree, Phy-ML3, and MEGAS). The manually curated alignments included three outgroup sequences from 
gammaproteobacteria such as Nitrococcus (Fig. 1A), 24 from alphaproteobacteria (most of those in Table 2, plus Caenispirillum salinarum and Tistrella bauzanensis to 
strengthen the clades of Caenispirillum and Tistrella, respectively), and also a representative of the new clade (2) and 19 mitochondrial sequences from the eukaryotic taxa 
listed in table S3. In all cases, the alignments had 317 amino acid sites. See fig. S6 for a much larger ML tree including most taxa examined here. COX operon variants of 
various alphaproteobacteria are inserted on the right of the respective clade, rendered as in Fig. 1A. The stem of the crown group of Iodidimonadales plus Kordiimonas 
(Iodo-Kordi in the inset on the left) and that of Rhizobiales are indicated by the orange bar. The length of these stems has been normalized to that of crown mitochondria 
(brownish bar) to evaluate the relative distance to crown mitochondria, proportional to divergence times estimated with Bayesian inference (78, 79); the mean values plus 
SD of the relative distance data are presented in the inset, with darker colored histograms representing data from n = 15 ML trees containing many more 
alphaproteobacterial COX3 proteins as in fig. S6. The pink circles annotate the branches with COX3 proteins devoid of the INDELs that are scattered among alphaproteobacteria (59), 
as shown in fig. S6. 


present in bacterial genomes; here, we systematically selected the 
proteins showing the highest homology to mitochondrial cyto- 
chrome c of Andalucia and other early-branching eukaryotes. We 
found no eukaryotic homologs for two proteins of the Ccm 
system, i.e., CcmD and CcmI, while CcmG is too similar to 
various redox proteins for identifying specific eukaryotic homologs. 
Moreover, CcmB is extremely similar to CcmA and therefore could 
not be considered as a separate protein trait, as in the case of MPPal- 
pha versus MPPbeta of complex III. Last, we excluded CtaB/Cox10 
because it is not encoded in any mtDNA nor is part of plant or 
protist complex IV. The total number of remaining proteins 
considered as separate aerobic traits was around 20, to which we 
added: alternative oxidase (AOX), the only terminal oxidase other 
than COX in eukaryotes (64); multiple forms of M16A and M16C 
peptidases that frequently occur in eukaryotic genomes [(28, 65); 
www.ebi.ac.uk/merops/cgi-bin/famsum?family=M16, last accessed on 7 
April 2023]; and a Cys signature toward the C terminus of COX1 
lying in a conserved region at the negative side of the membrane, 
which is common in taxa with reduced genomes such as Rickettsiales, 
more prone to differential gene loss than other lineages (6). We then 
constructed a preliminary presence/absence table for all the above 
traits in our selection of alphaproteobacteria and eukaryotes, 
detailed below, assigning a value of 1 for complete and 0.5 for 
incomplete sequences. A −1 penalty was given for the absence of 
COX1 and COX2 of A1 type COX, the type of catalytic subunits 
of mitochondrial complex IV, which must be present in possible 
ancestors of mitochondria (11). In contrast to previous 
presence-absence analyses (2, 7, 11, 15, 16, 54), we further assigned 
a premium score of 2 for the following syntenic associations: (i) 
COX1-COX2, forming a collinearity in the mtDNA of Andalucia 
(15) and marine members of the TSAR super-group (66), and also 
fused together in the mtDNA of Amoebozoa (11); (ii) 
Cox11 and COX3, which are syntenic in the mtDNA of Andalucia 
(15) and other early-branching eukaryotes (fig. S1A); and (iii) 
M16B and ISP, which form a rare synteny (Fig. 1). We assigned a 
value of 1 to the genes for M16B.016 peptidases closest to eukaryotic 
MPP (23) when such genes were associated with the COX operon, either 
directly or intermixed with Tsy or M32 (Fig. 1); these peptidases are 
all part of the sister clade of alphaproteobacterial proteins (65) 
and were identified by specific signatures in the aligned sequences. 
A value of 0.5 was given for genomic associations separated by three 
or fewer other genes from that of SURF1 (Fig. 1B). In addition, we 
assigned a premium of 1 whenever the gene for CcmF, the critical 
heme lyase of System I for cytochrome c biogenesis (63), was sur- 
rounded by at least two other Ccm genes, which generally did not 
correspond to those considered as separate traits here (Table 1). 
Multiple genes for the same trait within a genome were generally 
scored with a value equal to the number of such genes. Bacterial 
SCO and eukaryotic AOX showed such relatively uncommon 
situations (table S3). Conversely, multiple genes for M16A and M16C 
peptidases frequently occur in eukaryotic genomes (28). In such 
cases, we capped the maximal numeral at 5 (Table 1). In sum, the 
cumulative score of the various aerobic traits that we have analyzed 
here provides a multilayered and exhaustive evaluation of the 
aerobic metabolism of bacteria and eukaryotes that has never 
been considered before. To verify how the overall profile of the cu- 
mulative aerobic traits was influenced by the total number of differ- 
ent combinations of such traits, we carried out comparative plots of 
the median values. An example of these comparisons is presented in 
fig. S2A, showing a similar overall profile along the alphaproteobac- 
terial lineages we have selected to represent the whole class (see 
below). Using all 23 traits produced a more flattened distribution 
of values than in our previous analysis of 18 traits (fig. S2A), report- 
ed in the BioRxiv precursor version of this paper (59). We then se- 
lected the intermediate set of 20 traits listed in Table 1 to represent 
the quantitative distribution of aerobic metabolism (Fig. 2). 
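
The scoring rules described in this section can be condensed into a short sketch. It is an illustration under stated assumptions rather than the spreadsheet logic used for Table 1: the function name and input layout are hypothetical, the −1 penalty is assumed to apply separately to each missing subunit (COX1 and COX2), and the syntenic premium is assumed to be added on top of the individual gene values. The position-dependent values for M16B genes are omitted for brevity.

```python
from typing import Dict, List

MAX_COPIES = 5.0        # cap for multi-copy traits, as stated in the text
SYNTENY_PREMIUM = 2.0   # premium for each syntenic association
COX_PENALTY = -1.0      # assumed to apply per missing A1-type COX1 or COX2
CCMF_PREMIUM = 1.0      # CcmF flanked by at least two other Ccm genes


def cumulative_score(
    genes: Dict[str, List[float]],
    syntenies: int = 0,
    ccmf_clustered: bool = False,
) -> float:
    """Cumulative score of aerobic traits for one genome.

    `genes` maps a trait name to one value per gene copy:
    1.0 for a complete sequence, 0.5 for an incomplete one.
    """
    total = 0.0
    for copies in genes.values():
        # Multiple copies add up, but never beyond the cap.
        total += min(sum(copies), MAX_COPIES)
    for required in ("COX1", "COX2"):
        # Penalty when a catalytic subunit of A1-type COX is absent.
        if sum(genes.get(required, [])) == 0:
            total += COX_PENALTY
    total += SYNTENY_PREMIUM * syntenies
    if ccmf_clustered:
        total += CCMF_PREMIUM
    return total


# Hypothetical genome; the values are not taken from table S3.
genome = {
    "COX1": [1.0],
    "COX2": [1.0],
    "COX3": [0.5],        # incomplete sequence
    "SCO": [1.0, 1.0],    # two copies
    "M16A": [1.0] * 7,    # seven copies, capped at 5
}
score = cumulative_score(genome, syntenies=1, ccmf_clustered=True)
print(score)
```

Keeping the cap, premium, and penalty as named constants makes it straightforward to test how sensitive the overall profile is to each choice, in the same spirit as the comparison of different trait sets in fig. S2A.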


Taxonomic sampling of mitochondria and grouping of 
alphaproteobacterial lineages 

We applied our approaches to quantify the distribution of aerobic 
and other metabolic traits by making a thoughtful selection of all 
currently available genomes of alphaproteobacteria (including 
Magnetococcales), lithotrophic gammaproteobacteria, and eukary- 
otes with aerobic mitochondria. We excluded other bacteria 
because they do not have the subtype of COX operon that is char- 
acteristic of alphaproteobacteria and, in part, of mitochondria from 
early-branching eukaryotes [Fig. 1, cf. (11)]. Moreover, the MPP 
proteins of eukaryotes have been recently reported to be closely 
related to the M16B proteins of alphaproteobacteria (65), confirm- 
ing our previous results (59). The selection of the genomes for our 
analyses was based on extensive efforts to thoroughly deal with the 
issue of taxonomic sampling, most critical for evaluating the bacte- 
rial origin of mitochondria (1-8, 32, 54). The initial efforts focused 
on eukaryotic taxa that have aerobic mitochondria and can be con- 
sidered early branching, in the sense that they occupy basal branch- 
es in the various super-groups of eukaryotes (45) or have a large 
number of protein-coding genes in their mtDNA (15, 67, 68). 
The latter feature was particularly important in our selection 
because of the limited number of available mtDNA genomes that 
code at least one Ccm gene for cytochrome c biogenesis, a critical 
precondition for undertaking a balanced comparison in the aerobic 
traits of alphaproteobacteria and mitochondria. More than 95% of 
alphaproteobacterial genomes have several genes of the Ccm 
operon, which has been subsequently vertically inherited by 
eukaryotes (15, 16, 63). Four or fewer of such genes are coded in the 
mtDNA of the following eukaryotes: plants, Jakobida (15, 17), 
Diphylleia (16), Palpitomonas (68), and Cryptista such as 
Microheliella, Ancoracysta, Malawimonadida, and Cyanidiales among 
Rhodophyta (69). With the exception of Ciliophora such as Tetrahymena 
(70), other major super-groups of eukaryotes have System III for 
cytochrome c biogenesis (63, 71), consisting of a single heme lyase 
without bacterial precedents. We therefore considered only a few 
taxa, including Thecamonas, that have the eukaryotic innovation 
of system III (50, 53) and none having its possible precursor, 
system V restricted to Euglenozoa (71). After detailed analysis, in 
part, used for other aspects of this work, we also excluded Rhodel- 
phida (69) because their genome is incomplete and does not encode 
the Ccm system. Considering the limited number of nuclear 
genomes that are currently available for Discoba and Archaeplasti- 
dia other than plants, we settled to analyze a set of 20 eukaryotes 
predominantly having Ccm proteins (85%) to represent aerobic mi- 
tochondria (table S3). Permutations of some taxa of this set with 
other eukaryotes, either having Ccm proteins or lacking them as 
Bigyra (SAR), did not fundamentally alter the distribution of the 
aerobic traits in mitochondria and their cumulative score, which 
maintained a median value around 24 (Fig. 2). 
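
The robustness check described above, in which some taxa of the selected set are exchanged with other eukaryotes and the median cumulative score is recomputed, can be sketched as a simple resampling loop. The scores, the size of the replacement pool, and the number of swaps below are hypothetical and serve only to show the procedure.

```python
import random
import statistics
from typing import List


def median_after_swaps(
    selected: List[float], pool: List[float], n_swap: int, rng: random.Random
) -> float:
    """Replace `n_swap` randomly chosen scores in `selected` with scores
    drawn without replacement from `pool`; return the new median."""
    modified = list(selected)
    positions = rng.sample(range(len(modified)), n_swap)
    replacements = rng.sample(pool, n_swap)
    for pos, new_score in zip(positions, replacements):
        modified[pos] = new_score
    return statistics.median(modified)


# Hypothetical cumulative scores; these are not the values of table S3.
selected = [22, 23, 24, 24, 24, 25, 25, 26, 27, 21]
pool = [20, 23, 24, 25, 26, 28]

rng = random.Random(0)  # fixed seed so the run is repeatable
medians = [median_after_swaps(selected, pool, 3, rng) for _ in range(200)]
print(min(medians), statistics.median(medians), max(medians))
```

A narrow spread of the resampled medians around the original value would indicate, as stated in the text, that the choice of individual taxa does not fundamentally alter the cumulative score of the set.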

Next, we endeavored to produce a thorough selection of 
alphaproteobacteria representing the major lineages of the class 
with their specific metabolic traits. Recognizing that it is indispens- 
able to consider the spectrum of alphaproteobacteria diversity to 
evaluate their possible relationships with protomitochondria, we 
built the import repository of 314 genomes (publicly available at 
https://osf.io/t9qze/) representing all the major lineages of the 
class (2) plus the group of MarineProteol (1). The genomes were 
systematically reannotated for all the genes encoding proteins 
involved in aerobic metabolism and other metabolic pathways 
examined. Position-Specific Iterated BLAST (PSI-BLAST) searches of 
representative proteins (2, 58) were carried 
out to evaluate their completeness and re-annotation congruity. 
Our previous taxonomic analysis (2) guided the selection of 15 sep- 
arate lineages of alphaproteobacteria, spanning early-branching 
Rickettsiales to late-branching Rhodobacterales. Intermediate line- 
ages also corresponded to orders in the current taxonomy of 
alphaproteobacteria 
(www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=28211, accessed 
on 11 April 2022), which essentially derives from a recent 
reclassification (56). However, we considered as separate lineages 
two groups that most likely belong to the 
Rhodospirillales superorder (2) due to phylogenomic and other ob- 
servations, as follows. HIMB59 and related marine MAGs were 
clustered together with the lineage called "MarineAlpha," which 
includes other marine bacteria with similarly AT-rich genomes, in 
particular those classified under the TMED109 and TMED127 
orders in GTDB taxonomy (72). Holosporales, which generally 
cluster with clades of Rhodospirillales after correcting for their 
AT-rich genomes (7, 8), was considered an intermediate lineage 
between Rickettsiales and Rhodospirillales, a placement sustained 
by their low cumulative scores of aerobic traits (Fig. 2). We used 
around 60 taxa to encompass the wide genomic diversity of the Rho- 
dospirillales superorder, including at least two taxa for each major 
subdivision (2). We routinely kept the recently identified new clades 
of marine MAGs as a separate lineage from related Sneathiellales 
(including also Minwuiales) because of the diversity in their bioen- 
ergetic traits (2). We also kept the lineage of Iodidimonadales sep- 
arate from other SERIK groups (2) essentially because of their 
unique synteny of M16B-ISP (Fig. 1 and fig. S1). To increase the 
genomic diversity of Iodidimonadales, we included two 
alphaproteobacterial MAGs found in an anoxic hydrothermal niche (61), 
proteins of which cluster together with those of Iodidimonas in 
phylogenetic trees (for example, COX3 in fig. S6). We have also 
re-evaluated the raw metagenomic data from (61) to expand the 
genome completeness of alpha J134, as described below. Converse- 
ly, we merged the order of Rhodothalassiales with Kordiimona- 
dales, as well as that of Zavarziniales with Emcibacterales, by 
considering common features such as the number of INDELs in re- 
spiratory proteins (59). Representatives of all major families were 
included to encompass the genomic diversity of Rhizobiales (2, 
56). In contrast, only the deepest branching and major genera 
were included to represent the limited taxonomic diversity of the 
lineages of Pelagibacterales, Sphingomonadales, Caulobacterales, 
and Rhodobacterales. The Pelagibacterales lineage was placed just 
before the Rhizobiales lineage in accordance with recent phyloge- 
netic analyses (2, 56). We did not consider the orders of Parvular- 
culales and Maricaulales because they contain late-branching taxa 
that often cluster with representatives of the established orders of 
Caulobacterales or Rhodobacterales (2, 56). However, we consid- 
ered members of the Micropepsales (72, 73) in some analyses. We 
found that a small group of MAGs listed among Micropepsales in 
GTDB Release 07-RS207 (8 April 2022)
(https://gtdb.ecogenomic.org/searches?s=gt&q=o__Micropepsales, accessed on 18 August
2022) clustered within the clade of Rhizobiales, having similar 
sets of aerobic traits (fig. S2C). In all cases, the composition of 
the 15 alphaproteobacterial lineages was balanced to provide a 
good match in the relative frequency of bioenergetic traits with 
respect to the complete set of available genomes, as previously 
reported for NosZ (2). We verified that substituting several taxa 
listed in table S3 with other taxa of the most diverse lineages, Rho- 
dospirillales and Rhizobiales, hardly changed the median cumula- 
tive score of aerobic traits. Although we favored the inclusion of 
cultivated taxa with a complete genome, in several cases, this was 
not possible because the lineage included only MAGs, as in the 
case of MarineProteo1 (1, 8). In such lineages, we considered
MAGs with the most complete genomes, as deduced from the 
GTDB database or our own evaluations conducted as described 
earlier (2). MarineProteo1_Bin1 was kept despite its poor coverage
because of its prototypic position in the MarineProteo1 lineage (1,
8). In sum, the lineage-dependent plots of various traits we present 
in this work reflect divergence time along the x axis, essentially fol- 
lowing the consistent branching order of currently known alphap- 
roteobacteria (2). 


Analysis of genomic data

We used the genomic region from Iodidimonas muriae whole- 
genome assembly [National Center for Biotechnology Information 
(NCBI) accession number: BMOV01000000, contig 
BMOV01000004] as a reference to find clusters of genes equivalent 
to those of Fig. 1A in the metagenomic data of (61), reported under 
NCBI Bioproject PRJNA392119. First, we manually inspected the 
reported genome of I. muriae (74) to verify the correctness of its 
annotation, detecting one artifact from the annotation process (ac- 
cession GGO11109.1), while another gene within the COX operon 
(accession GGO11122.1) was reannotated as a DUF983 domain- 
containing protein. Three MAGs found to cluster with Iodidimona- 
dales (Fig. 5 and fig. S6) were downloaded: RFID00000000.1 (J067), 
RFKS00000000.1 (J134), and RFIU00000000.1 (J084). Using the 
available raw data, each metagenome was reassembled using 
SPAdes v3.14.1 with default parameters. The obtained assembly
was used as a database to search for the genes of the clustered 
COX-M16B-bc1 operons of I. muriae (Fig. 1A), using the NCBI
tBLASTN suite with default parameters. The selected contigs thus 
obtained were used for manual inspection and curation using the 
genome browser Artemis v18.1.0. Among the contigs examined, 
contig RFKS01000109 of alpha J134 was found to contain the sequence
of genes starting from partial COX1, essentially as previously
reported (61). To improve and fully reconstruct the genomic
region of interest, we performed a metagenomic reassembly after
retrieving the original fastq files of four reported samples
(SRR7905022, SRR7905023, SRR7905024, and SRR7905025) (61) with the
fasterq-dump program of the Sequence Read Archive (SRA) toolkit,
run with the --split-files flag. From the resulting new assembly,
we extracted contig 829, which was 33.2 kb long and contained
the complete series of the I. muriae genomic cluster, plus a
DNA insertion after the M16B gene that might encode a hypothetical
protein (Fig. 1C). We could not firmly validate this possibility
because contig RFKS01000109 included a different DNA
sequence after the M16B gene. However, we later found that the re- 
cently reported MAG, Rhodothalassiaceae bacterium KatS3mg119 
(75), presents the same gene sequence uncovered for J134 (Fig. 1C), 
with the insertion of an aryl esterase gene between M16B peptidase 
and ISP. Notably, the proteins of this MAG closely cluster with 
those of J067, which is early branching among Iodidimonadales. 
The genome assembly of J067 and related J084 (61) was more frag- 
mented than that of J134; even if partial gene clusters of the COX- 
bc1 operons were present in some fragments, the complete synteny 
with the I. muriae cluster (Fig. 1A) could not be confirmed. 
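
The retrieval, reassembly, and search steps described above can be outlined as a sequence of command lines. The following Python sketch only builds the commands and does not run them. It assumes paired-end reads (so that --split-files yields _1 and _2 fastq files); the file names, output directories, tabular output format, and the helper name build_pipeline are hypothetical and are not taken from this study.

```python
from typing import List


def build_pipeline(run_id: str, query_faa: str) -> List[List[str]]:
    """Return the command lines for one SRA run (nothing is executed)."""
    r1 = f"{run_id}_1.fastq"              # assumed paired-end layout
    r2 = f"{run_id}_2.fastq"
    asm_dir = f"{run_id}_spades"          # hypothetical output directory
    contigs = f"{asm_dir}/contigs.fasta"  # contig file written by SPAdes
    db = f"{run_id}_contigs_db"           # hypothetical BLAST database name
    return [
        # 1. Retrieve the original reads from the Sequence Read Archive.
        ["fasterq-dump", "--split-files", run_id],
        # 2. Assemble with SPAdes, leaving all other parameters at defaults.
        ["spades.py", "-1", r1, "-2", r2, "-o", asm_dir],
        # 3. Turn the assembled contigs into a nucleotide BLAST database.
        ["makeblastdb", "-in", contigs, "-dbtype", "nucl", "-out", db],
        # 4. Search the contigs with the reference proteins (tBLASTN).
        #    Tabular output (-outfmt 6) is a convenience choice made here.
        ["tblastn", "-query", query_faa, "-db", db,
         "-outfmt", "6", "-out", f"{run_id}_hits.tsv"],
    ]
```

Calling build_pipeline once per run accession (for example "SRR7905022") with a FASTA file of the I. muriae cluster proteins gives the four commands in order; contigs with hits would then be inspected manually, as described in the text.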


Distribution of the traits for ceramide and kynurenine 
biosynthesis 

The second approach that we used followed a conventional absence- 
presence analysis (table S5). We first conducted PSI-BLAST search- 
es of putative homologs of Caulobacter SPT (29, 31) in all the 
genomes of our in-house repository, plus those of closely related 
taxa. An E value cutoff of 5 × 10⁻⁷⁰ was generally sufficient to differentiate
genuine SPT homologs from related α-oxoamine enzymes
such as 5-aminolevulinate synthase. The genomic regions contain- 
ing the gene encoding the identified SPT homologs were then care- 
fully inspected to verify the presence of the four-gene operon that is 
common in ceramide-synthesizing bacteria (Fig. 3A) and the possible
vicinity of other genes required for sphingolipid biosynthesis
(31). Each protein of the operon was then analyzed in detailed 
alignments to identify its biochemical nature (table S5). Various
bacterial SPT proteins were then used in PSI-BLAST searches ex- 
tended to eukaryotes, focusing on the species selected for the com- 
parative analysis of the aerobic traits (table S5). We considered both 
the active and inactive isoforms of eukaryotic SPT, which likely 
derive from a process of duplication and differentiation of the 
single spt gene of bacteria (Fig. 3B) (30). An equivalent strategy 
was used for identifying bacterial and eukaryotic homologs of Pseu- 
domonas kynureninase, encoded by the kynU gene which is often 
adjacent to kynA and other genes for components of the kynurenine 
pathway (35, 36). 
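
The absence-presence scoring can be illustrated with a small sketch. It assumes that search hits have already been parsed into (genome, gene, E value) tuples and that the cutoff is supplied by the user; the function name and the data layout are hypothetical, and the manual inspection of operon context described above is not modeled.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (genome, query gene, E value)


def presence_absence(hits: Iterable[Hit], genes: List[str],
                     evalue_cutoff: float) -> Dict[str, Dict[str, int]]:
    """Score 1 if a genome has a hit at or below the cutoff, else 0."""
    table: Dict[str, Dict[str, int]] = {}
    for genome, gene, evalue in hits:
        # Every genome that appears in the hit list gets a full row of zeros.
        row = table.setdefault(genome, {g: 0 for g in genes})
        if gene in row and evalue <= evalue_cutoff:
            row[gene] = 1
    return table
```

A genome with only weak hits keeps a score of 0 for that gene, which mirrors the use of a stringent cutoff to separate genuine homologs from related enzymes.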


Analysis of enzymes involved in cardiolipin metabolism

Cardiolipin or diphosphatidylglycerol is a typical membrane lipid of 
prokaryotes that, in eukaryotes, is present almost exclusively in the inner
membrane of mitochondria (14, 37-42). In this work, we have tackled
the complicated issue of the evolutionary pathways of cardiolipin 
biosynthesis and metabolism (14, 37, 40). The issue is complicated 
because there are two fundamentally different enzymes that directly 
synthesize cardiolipin, which have a scattered distribution in pro- 
karyotes and eukaryotes (14, 37, 41). We compiled a detailed anal- 
ysis of the distribution of enzymes for cardiolipin biosynthesis in 
alphaproteobacteria (fig. S5C and table S6). After finding some al- 
phaproteobacteria taxa that have both types of cardiolipin synthase, 
Cls-AT and Cls_pld (Fig. 3E and table S6), we realized that our 
results would support a previous proposal (41): the ancestral state 
of early eukaryotes might have contained both types of the synthase. 
We thus undertook a detailed analysis of the following enzymes in- 
volved in cardiolipin metabolism: (i) PGP synthase, PgsA (44); (ii) 
cardiolipin synthase of the CDP-AT type, Cls-AT; (iii) cardiolipin 
synthase of the PLD type, Cls_pld; and (iv) cardiolipin-specific
phospholipase of the cld1 type, typical of yeast (39) but with clear 
homologs in alphaproteobacteria, as verified by phylogenetic anal- 
ysis. Given the variety of CDP-AT proteins present in some alphap- 
roteobacterial genomes, especially in MAGs belonging to the 
Rhodospirillales "superorder," and considering the limited differ- 
ences between PgsA and Cls-AT (37), we undertook detailed se- 
quence analysis using manually curated alignments of all these 
proteins, taking into consideration known three-dimensional fea- 
tures from available structures (43, 44). We were able to distinguish 
homologs to PgsA by the presence of a conserved Arg residue that is 
involved in PGP binding—R108 in the structure of Staphylococcus 
PgsA (44), indicated by the red arrow in the alignment block of fig. 
S4B. Cls-AT proteins from either bacteria or mitochondria substitute
this Arg with hydrophobic amino acids (fig. S4B), while proteins
of other families of the CDP-AT superfamily have different
sequence features in the same protein region (44). On the basis of 
these molecular differences, we could consider the CDP-AT pro- 
teins with elongated N terminus found in Iodidimonadales (table 
S5) as possible relatives of Cls-AT. Phylogenetic analysis further 
confirmed this probable relatedness (Fig. 3D), indicating that 
such CDP-AT proteins with elongated N termini may represent 
an ancestral form of A family CDP-AT including bacterial and mi- 
tochondrial Cls-AT (44). We cross-checked the genomic distribu- 
tion of cardiolipin-synthesizing enzymes with the documented 
presence of cardiolipin in bacteria, obtaining a good correspon- 
dence between the absence of such genes (table S5) and the lack 
of cardiolipin in Caulobacter spp. (47). Similar good correlations 
were obtained with alphaproteobacterial lineages such as Rickettsiales
(37), for which no biochemical evidence of cardiolipin exists.
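
The residue-based distinction between PgsA and Cls-AT homologs can be sketched as a simple rule applied to one column of a curated alignment. This is a minimal illustration, not the procedure used for the manually curated alignments: the set of residues treated as hydrophobic, the function name, and the requirement that the user supplies the column index are all assumptions.

```python
HYDROPHOBIC = set("AVLIMFW")  # assumed residue set, for illustration only


def classify_cdp_at(aligned_seq: str, arg_column: int) -> str:
    """Label one aligned CDP-AT sequence by its residue at the diagnostic column.

    arg_column is the 0-based alignment column that holds the PGP-binding Arg
    of PgsA (R108 in the Staphylococcus structure). Its index depends on the
    alignment and must be provided by the user.
    """
    residue = aligned_seq[arg_column].upper()
    if residue == "R":
        return "PgsA-like"
    if residue in HYDROPHOBIC:
        return "Cls-AT-like"
    # Gaps and any other residue fall outside both categories.
    return "other CDP-AT"
```

Sequences labeled "other CDP-AT" would still require inspection of the surrounding region, because other families of the superfamily differ there in ways that a single column cannot capture.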


Analysis of anaerobic traits 

We have expanded the analysis of the kynurenine pathway to eval- 
uate whether such a pathway might be involved in de novo biosyn- 
thesis of RQ (36) in alphaproteobacteria. We cross-checked this 
analysis with the evaluation of the taxonomic distribution of the 
rquA gene found in Rhodospirillum rubrum, which enables the conversion
of preformed Q into RQ (50). We found rquA distribution
to be mutually exclusive with the combination of two genes for 
UbiA plus the kynU gene adjacent to other genes for the kynurenine 
pathway (see text, cf. Fig. 4). This mutual exclusivity has been ver- 
ified in the genomes of the genus Rhodoferax known to contain RQ 
(76). For example, Rhodoferax fermentans has rquA (protein acces- 
sion WP_078364963) but not kynU, whereas Rhodoferax sediminis 
has the kynU gene for a functional kynureninase (protein accession 
WP_142820272) but no rquA homolog. The presence of the 
ubiTUV triad of genes required for the anaerobic biosynthesis of 
Q (52) was identified by the genomic collinearity of these genes 
and protein alignments defining the distinctive signatures of the 
coded proteins (2), including the conserved Cys ligands of the 
[4Fe4S] cluster (51, 52). BLAST searches extended to eukaryotes 
identified diverse paralogs with the same U32 peptidase domain 
as UbiU (52), including composite proteins showing the fusion of 
this domain with the DUF3656 domain as in the protein coded by E. 
coli rlhA (52, 77), often in combination with a similar, UbiV-like 
domain fused at the C terminus. Most frequently, such proteins 
were found among members of the SAR super-group (45) and 
Chlorophyta; however, similar proteins are present in Magnetococ- 
cales and other deep-branching alphaproteobacteria. Eukaryotic 
protein hits found among invertebrates were identified instead as 
likely bacterial contaminants on the basis of dedicated BLAST 
searches. Identification of the different types of OFORs was under- 
taken following the molecular analysis described earlier (53). Their 
presence was quantified by assigning a value of 1 to each monomeric or
heterodimeric type and of 0.5 whenever a protein was incomplete.
OFORs involved in the biosynthesis of photosynthetic pigments 
were ignored. 
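
The OFOR scoring rule can be written out as a short function. The sketch assumes one record per OFOR type found in a genome, with flags stating whether the protein is complete and whether it is involved in the biosynthesis of photosynthetic pigments; the record layout and the function name are hypothetical.

```python
from typing import Iterable, Tuple

# (OFOR type label, is_complete, involved_in_pigment_biosynthesis)
OforRecord = Tuple[str, bool, bool]


def ofor_score(records: Iterable[OforRecord]) -> float:
    """Sum 1 per complete OFOR type and 0.5 per incomplete one."""
    score = 0.0
    for _label, complete, pigment_related in records:
        if pigment_related:
            continue  # OFORs for photosynthetic pigment biosynthesis are ignored
        score += 1.0 if complete else 0.5
    return score
```

For example, a genome with one complete type, one incomplete type, and one pigment-related type would score 1.5.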


Phylogenetic and molecular analysis of proteins

Phylogenetic analysis was conducted with large alignments of 
diverse protein sequences (2, 58). Such alignments were manually 
implemented after rounds of automated alignment with the 
MUSCLE program within the MEGA software versions 5 and X 
(2, 58). Short gaps that were specific for one or a few proteins 
were deleted, while the N and C termini were minimally trimmed 
to preserve potential signatures. Phylogenetic analysis was undertaken
with ML inference using the program IQ-Tree and Bayesian
inference using the BEAST program or neighbor-joining with the
MEGA5 program as previously described (58). The Le and
Gascuel (LG) and Whelan and Goldman (WAG) models were
most frequently used for tree reconstruction. Calculation of the rel- 
ative stem length of crown groups in phylogenetic trees was under- 
taken by a modification of the methods reported recently (78, 79), 
normalizing the distance values from the root branch to the value of 
the mitochondrial clade = 1. See the legend of Fig. 5 for 
further details. 
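
The normalization step for relative stem length can be sketched as follows. The sketch assumes that the distance from the root branch has already been measured for each crown group (how these distances are taken from a tree is described in the legend of Fig. 5 and is not reproduced here); the function name and the clade labels are hypothetical.

```python
from typing import Dict


def relative_stem_lengths(stem_lengths: Dict[str, float],
                          reference: str = "mitochondria") -> Dict[str, float]:
    """Rescale stem lengths so that the reference clade has a value of 1."""
    ref = stem_lengths[reference]
    if ref <= 0:
        raise ValueError("reference stem length must be positive")
    return {clade: length / ref for clade, length in stem_lengths.items()}
```

After rescaling, values below 1 indicate crown groups with a shorter stem than the mitochondrial clade in the same tree, and values above 1 a longer one.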


Similarity analysis with top alphaproteobacterial hits in
BLAST searches 

To provide an independent quantitative approach for selecting alphaproteobacterial
taxa according to the similarity of their respiratory proteins
to mitochondrial homologs, we undertook systematic searches with
PSI-BLAST in the NCBI database
(https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp, accessed on 24 August 2022). The
searches were extended to all alphaproteobacteria, including unclas- 
sified and incertae sedis. The top five hits, which provided a good 
balance between specificity and limited noise in the similarity 
data (80), were tabulated and the frequency of multiple hits was 
computed for each bacterial taxon. This analysis focused on the 
nuclear-encoded proteins of the COX-M16B/MPP-bc1 gene
cluster (Fig. 1A). We found that the results of MPPbeta queries 
showed the largest presence of hits (43%) among the alphaproteo- 
bacterial taxa previously selected for other analyses (table S3). They 
are presented in fig. S2C (orange symbols, each representing an in- 
dividual taxon). The alphaproteobacteria in this figure included 
some Micropepsales (73) with discrete frequencies of hits. 
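
The tabulation of top hits can be sketched with a counter. The sketch assumes that, for each query protein, the alphaproteobacterial hits are already available as a list of taxon names ordered from best to worst; the function name and the data layout are hypothetical, and each occurrence of a taxon is counted as one hit.

```python
from collections import Counter
from typing import Dict, List


def top_hit_frequency(ranked_hits: Dict[str, List[str]],
                      top_n: int = 5) -> Counter:
    """Count how often each taxon appears among the top_n hits of all queries."""
    counts: Counter = Counter()
    for taxa in ranked_hits.values():
        counts.update(taxa[:top_n])  # keep only the best top_n hits per query
    return counts
```

Taxa that recur among the top hits of several queries accumulate higher counts, which is the quantity plotted per taxon in this kind of analysis.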


Statistical analysis 

Statistical analysis was conducted with the 99% confidence interval 
of the t test (81). In particular, we conducted independent t tests to 
verify the two-tailed hypothesis that the cumulative scores of the 20 
aerobic traits (Table 1) for each lineage significantly differed from 
mitochondria. To guard against inflated type 1 errors (i.e., false pos- 
itives) from multiple testing, these analyses were conducted at the 
more stringent 1% significance level (alpha = 0.01). All analyses 
were run in R (version 4.1.0) using the stats package. 